ABSTRACT

Least squares problems arise when one attempts to fit a model
$y = \eta(x, \beta)$ to points $(y_1, x_1), \ldots, (y_n, x_n)$.
Solutions to such problems are obtained by minimizing the sum of
squared deviations over an admissible region. This paper discusses
the basic theory of optimization for a general objective function and
applies this material to both linear and nonlinear least squares
problems. In linear least squares, normal equations for both the full
rank and less than full rank cases are considered and the Kuhn-Tucker
conditions are used to obtain the normal equations under linear
inequality constraints. In nonlinear least squares, different
iterative methods which may be used to obtain a solution are
discussed. The methods considered are steepest descent,
Newton-Raphson, Gauss-Newton, Hartley's modified Gauss-Newton, and
that of Marquardt. Results are obtained which relate Marquardt's
method to equality constrained linear least squares.
1. OPTIMIZATION
1.1  Iteration  Procedures 

We are primarily concerned with optimizing, i.e. maximizing or
minimizing, an objective function $f(t) = f(t_1, \ldots, t_p)$ over an
admissible region $K \subset E^p$.

A general iteration procedure is as follows:

(i) choose a seed point $t \in K$,

(ii) on the basis of the local behavior of $f(\cdot)$ at t, select a
new iterate $At \in K$,

(iii) replace t by At and return to step (ii).

A is a mapping of K into itself. The properties of such an iteration
scheme are the properties of A.

The following are examples of some common iteration procedures on
$E^p$.

Example  1.  Steepest  Descent 

A descent direction, s, at the point t, is a unit vector such that

$\frac{d}{d\lambda} f(t + \lambda s) \big|_{\lambda = 0} = s^T \nabla f(t) < 0$;

hence, any descent direction must be within $\pi/2$ of the negative
gradient, $-\nabla f(t)$.

A general descent procedure is:

(i) choose a seed point $t \in E^p$,

(ii) at t, select a descent direction s and a step length $\lambda$,

(iii) return to (ii), replacing t by

$At = t + \lambda s$.


The steepest descent technique chooses as its descent direction the
direction of $-\nabla f(t)$. This is a locally optimal choice in that,
among unit vectors s,

$\frac{d}{d\lambda} f(t + \lambda s) \big|_{\lambda = 0} = s^T \nabla f(t)$

is minimized for s proportional to $-\nabla f(t)$.

The corresponding technique for maximization is the method of
steepest ascent. See Nobel (1969, p.403) for a discussion of
convergence.
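The descent procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: the test function $f(t) = t_1^2 + 4 t_2^2$, the fixed step, and the names `grad_f` and `steepest_descent` are hypothetical choices, not taken from the text. The update subtracts `step` times the gradient, which corresponds to the unit direction $s = -\nabla f(t)/\|\nabla f(t)\|$ with step length $\lambda$ equal to `step` times $\|\nabla f(t)\|$.

```python
def grad_f(t):
    """Gradient of the assumed test function f(t) = t1^2 + 4*t2^2."""
    return [2.0 * t[0], 8.0 * t[1]]


def steepest_descent(t, step=0.1, iters=200):
    """Repeat t <- t - step * grad f(t) for a fixed number of iterations."""
    for _ in range(iters):
        g = grad_f(t)
        t = [ti - step * gi for ti, gi in zip(t, g)]
    return t


# Seed point chosen arbitrarily; the minimum of f is at the origin.
t_min = steepest_descent([3.0, -2.0])
```

With this step each coordinate is multiplied by a constant factor (0.8 and 0.2 respectively) per iteration, so the iterates shrink geometrically toward the origin.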


Example  2.  Newton's  Method 

Newton's method is designed to solve $h(t) = 0$, where h is a
continuously differentiable, real valued function of a real variable.
The mapping which defines Newton's method, say $Nt^*$, is obtained by
solving

$\frac{dh(t^*)}{dt} (t - t^*) + h(t^*) = 0$

for t. That is, we are to find the intercept of the tangent to the
curve determined by h, at the point $(t^*, h(t^*))$; hence,

$Nt = t - h(t) \Big/ \frac{dh(t)}{dt}$.
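As a small illustration of the map $Nt = t - h(t)/h'(t)$, the sketch below applies it to the assumed example $h(t) = t^2 - 2$, whose positive root is $\sqrt{2}$. The function name `newton`, the seed point, and the iteration count are illustrative choices only.

```python
def newton(h, dh, t, iters=20):
    """Apply the map Nt = t - h(t) / h'(t) a fixed number of times."""
    for _ in range(iters):
        t = t - h(t) / dh(t)
    return t


# Assumed example: h(t) = t^2 - 2 with derivative 2t, seed point t = 1.
root = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
```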

While Newton's method is designed specifically to solve nonlinear
equations, it can be used to obtain a critical point of an objective
function f and thus a potential solution to the optimization problem.
Specifically, to solve $f'(t) = 0$ by Newton's method, we have

$Nt = t - f'(t) / f''(t)$.


Example  3.  Newton-Raphson  method 


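The Newton-Raphson method is a p-dimensional analog of Newton's
method. Let $t \in E^p$ and $h(t) = (h_1(t), \ldots, h_p(t))^T$; assume
that each $h_i$ is continuously differentiable, so that the tangent
plane to the surface $z_i = h_i(t)$ at $t^*$ is

(1.1)  $z_i = h_i(t^*) + \nabla h_i(t^*)^T (t - t^*)$;  $i = 1, \ldots, p$.

We obtain $Nt^*$ by finding the point of intersection of the tangent
planes (1.1) with the plane z = 0. This yields

$Nt = t - \left[ \frac{\partial h(t)}{\partial t} \right]^{-1} h(t)$,

where $\partial h(t) / \partial t$ is the $p \times p$ matrix whose
ith row is $\nabla h_i(t)^T$, assumed nonsingular.

When the Newton-Raphson method is employed to find critical points of
an objective function f we get the matrix equation

$Nt = t - [\nabla^2 f(t)]^{-1} \nabla f(t)$.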
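A minimal two-dimensional sketch of the Newton-Raphson map for critical points, $Nt = t - [\nabla^2 f(t)]^{-1} \nabla f(t)$, follows. The objective $f(t) = t_1^4 + t_1 t_2 + (1 + t_2)^2$, the seed point, and the names `grad`, `hess`, and `newton_raphson` are assumptions made for illustration; the 2 x 2 linear system is solved by Cramer's rule to keep the example self-contained.

```python
def grad(t):
    """Gradient of the assumed objective f(t) = t1^4 + t1*t2 + (1 + t2)^2."""
    t1, t2 = t
    return [4.0 * t1 ** 3 + t2, t1 + 2.0 * (1.0 + t2)]


def hess(t):
    """Hessian (matrix of second partial derivatives) of the same objective."""
    t1, _ = t
    return [[12.0 * t1 ** 2, 1.0], [1.0, 2.0]]


def newton_raphson(t, iters=20):
    """Iterate t <- t - H(t)^{-1} grad f(t), solving the 2 x 2 system directly."""
    for _ in range(iters):
        g = grad(t)
        (a, b), (c, d) = hess(t)
        det = a * d - b * c
        # Cramer's rule for H * step = g.
        step = [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]
        t = [t[0] - step[0], t[1] - step[1]]
    return t


crit = newton_raphson([0.75, -1.25])
```

From this seed point the iterates settle near (0.6959, -1.3479), where the gradient vanishes.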


The convergence of a general iteration procedure is of considerable
importance. A condition on A sufficient to guarantee convergence is
given by Kolmogorov and Fomin (1957, p. 43). Let R be a metric space
with metric $\rho$. A mapping, A, of R into itself is a contraction if
there exists $\alpha$ ($0 < \alpha < 1$) such that

$\rho(At, At^*) \le \alpha \rho(t, t^*)$ for all $t, t^* \in R$.

Theorem 1.1

Every contraction mapping defined in a complete metric space R has
one and only one fixed point (i.e. the equation At = t has one and
only one solution).

Furthermore, Kolmogorov and Fomin's proof of this Theorem implies
that, given a seed point t, the sequence $t, At, A^2 t, \ldots$
converges to the fixed point.
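The convergence of the sequence $t, At, A^2 t, \ldots$ can be observed numerically. The sketch below assumes the map $At = \cos t$ on the interval [0, 1], which maps the interval into itself and satisfies $|A'(t)| = |\sin t| \le \sin 1 < 1$ there, so it is a contraction. The helper name `iterate` and the seed point are illustrative.

```python
import math


def iterate(A, t, n):
    """Return A^n t, the result of applying the map A to t a total of n times."""
    for _ in range(n):
        t = A(t)
    return t


# Assumed contraction: A t = cos(t) on [0, 1], seed point t = 0.5.
fixed = iterate(math.cos, 0.5, 200)
```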

Theorem 1.2

Let f be defined on [a, b]. If $f'(a) < 0 < f'(b)$ and
$0 < k_1 \le f''(t) \le k_2$ on [a, b] then the equation $f'(t) = 0$
has a unique solution in (a, b) and the mapping
$At = t - \lambda f'(t)$ is a contraction for
$0 < \lambda < k_2^{-1}$.

Proof

$f'$ is continuous and strictly increasing with $f'(a) < 0 < f'(b)$.
Therefore $f'(t) = 0$ has a unique solution in (a, b).

If $t_0, t_1 \in [a, b]$ then

$At_0 - At_1 = [1 - \lambda f''(\xi)](t_0 - t_1)$;  $a \le \xi \le b$.

In particular, for $t \in [a, b]$ and $0 < \lambda < k_2^{-1}$,

$At - a = [1 - \lambda f''(\xi)](t - a) + Aa - a > 0$.

Similarly $b > At$. Finally,

$|At_0 - At_1| = |1 - \lambda f''(\xi)| \, |t_0 - t_1| \le (1 - \lambda k_1) |t_0 - t_1|$.

Therefore A is a contraction on [a, b].

As a corollary we see that, under the conditions of the Theorem, the
method of steepest descent (with fixed step length) converges to a
unique minimum of f.
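The bound in Theorem 1.2 can be checked on a concrete case. The sketch assumes $f(t) = t^2 + e^t$ on [a, b] = [-1, 0], for which $f'(t) = 2t + e^t$ changes sign on the interval and $f''(t) = 2 + e^t$ lies between $k_1 = 2 + e^{-1}$ and $k_2 = 3$; the step $\lambda = 0.3$ satisfies $\lambda < k_2^{-1}$. All names and numerical choices are illustrative.

```python
import math


def df(t):
    """First derivative of the assumed function f(t) = t^2 + exp(t)."""
    return 2.0 * t + math.exp(t)


lam = 0.3                     # fixed step, below 1/k2 = 1/3
k1 = 2.0 + math.exp(-1.0)     # lower bound of f'' on [-1, 0]


def A(t):
    """Fixed-step descent map At = t - lam * f'(t)."""
    return t - lam * df(t)


# Observed contraction ratio for the two endpoints of the interval.
t0, t1 = -1.0, 0.0
ratio = abs(A(t0) - A(t1)) / abs(t0 - t1)

# Iterating A from the left endpoint approaches the root of f'(t) = 0.
t = t0
for _ in range(100):
    t = A(t)
```

The observed ratio stays below $1 - \lambda k_1$, and the limit of the iteration is the unique minimizer of f on the interval.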

1.2  Optimization  with  Constraints 

The mathematical programming problem is as follows:

Given the numerical functions $f, g_1, \ldots, g_m$ defined on $E^p$,
find a point t of $E^p$ satisfying

$g_j(t) \ge 0$,  $j = 1, \ldots, m$

and such that f(t) is as large as possible. A solution of the problem
will be denoted by $\hat{t}$. Minimization problems may be handled by
taking $-f(\cdot)$ as the objective function.

The inequalities $g_j(t) \ge 0$ are called the constraints of the
program; points which satisfy the constraints are feasible points and
the set of feasible points is the feasible region, denoted by K.
Throughout this chapter it will be assumed that f and
$g_1, \ldots, g_m$ are differentiable.

Two well-known special cases are (i) when f and $g_1, \ldots, g_m$ are
linear, we have the linear programming problem and (ii) when f is a
quadratic form and $g_1, \ldots, g_m$ are linear, we have the
quadratic programming problem.

Kuhn and Tucker (1951) developed a set of conditions, the K-T
conditions, which under mild regularity conditions are necessary for
a solution to the programming problem. Their conditions are that
there exist $u_1, u_2, \ldots, u_m$ such that

$g_j(\hat{t}) \ge 0$;  $j = 1, \ldots, m$

$u_j \ge 0$;  $j = 1, \ldots, m$

$u_j g_j(\hat{t}) = 0$;  $j = 1, \ldots, m$

and

$\nabla f(\hat{t}) + \sum_{j=1}^{m} u_j \nabla g_j(\hat{t}) = 0$.

There  are  several  sets  of  such  regularity  conditions  which 
can  be  employed.  Here  we  do  not  try  for  the  most  general  results 
but  we  give  conditions,  involving  generalized  concavity,  which  are 
easily  understood  and  yet  quite  general. 

Differentiable functions with the property that

$f(y) > f(x)$ implies $\nabla f(x) \cdot (y - x) > 0$

(increasing function implies positive slope) are called
pseudo-concave by Mangasarian (1965).

In his unpublished 1953 notes, Convex Cones, Sets and Functions,
W. Fenchel treats the concept of quasi-concavity. A real valued
function f(x) having convex domain is called quasi-concave
(q-concave) if
$f(\lambda x + \bar{\lambda} y) \ge \min(f(x), f(y))$ whenever
$0 < \lambda < 1$ and $\bar{\lambda} = 1 - \lambda$. For
differentiable functions, pseudo-concave implies q-concave.

Several alternative characterizations of q-concave functions are
available.

Theorem 1.3

The following conditions are equivalent

We see, for example, that convexity of the "constraint set"
$\{x : g_1(x) \ge \tau_1, g_2(x) \ge \tau_2, \ldots, g_m(x) \ge \tau_m\}$
is assured for all $\tau_1, \ldots, \tau_m$ when and only when
$g_1, g_2, \ldots, g_m$ are q-concave functions. We may now state the
following results, see Mangasarian (1969).


Theorem 1.4

... which are not affine ... programming problem then there exists
$u = (u_1, \ldots, u_m)$ such that $\hat{t}$ and u satisfy the K-T
conditions.

Generalized concavity can also provide a framework within which the
K-T conditions are sufficient.

Theorem 1.5

... the Kuhn-Tucker conditions then t solves the mathematical
programming problem; that is, $t = \hat{t}$.

As an example consider optimizing a quadratic objective function
subject to linear inequality constraints. This is often called
quadratic programming. In general, we would have the following
problem:

Minimize  $\frac{1}{2} x^T F x - x^T d$

subject to  $G^T x \ge b$

where F and G are matrices, with F symmetric, while b and d are
vectors.

The K-T conditions,

$G^T x \ge b$,  $u \ge 0$

$u^T (G^T x - b) = 0$

$G u = F x - d$

are necessary for a solution to the problem (see Theorem 1.4). If F
is positive semidefinite then the objective function is convex, so
its negative (the function maximized in the programming problem) is
concave, and Theorem 1.5 tells us that the K-T conditions are also
sufficient.
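The three K-T conditions for the quadratic program can be verified numerically for a candidate pair (x, u). The sketch below assumes a tiny problem with F the 2 x 2 identity, d = (2, 1), and the single constraint $-x_1 \ge -1$; the helper `kt_residuals` and all data are hypothetical. For this problem the unconstrained minimum x = d is infeasible, and the candidate x = (1, 1) with u = 1 satisfies every condition.

```python
def kt_residuals(F, G, b, d, x, u):
    """Return (G^T x - b, u^T (G^T x - b), G u - (F x - d)) for a candidate (x, u)."""
    p, m = len(x), len(u)
    # Feasibility: each entry of the slack must be nonnegative.
    slack = [sum(G[i][j] * x[i] for i in range(p)) - b[j] for j in range(m)]
    # Complementary slackness: must equal zero.
    comp = sum(u[j] * slack[j] for j in range(m))
    # Stationarity: every entry must equal zero.
    stat = [
        sum(G[i][j] * u[j] for j in range(m))
        - (sum(F[i][k] * x[k] for k in range(p)) - d[i])
        for i in range(p)
    ]
    return slack, comp, stat


F = [[1.0, 0.0], [0.0, 1.0]]
G = [[-1.0], [0.0]]          # one column per constraint: -x1 >= -1
b = [-1.0]
d = [2.0, 1.0]
slack, comp, stat = kt_residuals(F, G, b, d, x=[1.0, 1.0], u=[1.0])
```

Since F is positive definite here, satisfying the conditions with $u \ge 0$ identifies the candidate as the constrained minimizer.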

We have

$\frac{1}{2} x^T F x - x^T d = \frac{1}{2} (x - c)^T F (x - c) + \text{constant}$

for some c, if and only if $x^T d = x^T F c$ for all x. This last is
equivalent to the equation $F c = d$ which has a solution, c, when
and only when d is in the column space of F. Thus the quadratic
programming problem can be written in the special form: minimize
$\frac{1}{2} (x - c)^T F (x - c)$ subject to $G^T x \ge b$ if and only
if d is in the column space of F. In particular, the positive
definite quadratic programming problem can always be written in this
form.

2. LEAST SQUARES
2.1  Linear  Least  Squares 

Least squares problems arise when one attempts to fit a model
$y = \eta(x, \beta)$ to points $(y_1, x_1), \ldots, (y_n, x_n)$. Here
$\eta$ is a function of known form, $\beta$ is a parameter to be
estimated, and x is a vector of independent variables. We obtain the
fitted model by minimizing

(2.1)  $\phi(\beta) = \sum_{i=1}^{n} [y_i - \eta(x_i, \beta)]^2$

with respect to $\beta$, a solution being denoted by $\hat{\beta}$.

The deviations $d_i(\beta) = y_i - \eta(x_i, \beta)$;
$i = 1, \ldots, n$, or $d = Y - \eta(\beta)$, where
$\eta(\beta) = (\eta(x_1, \beta), \ldots, \eta(x_n, \beta))^T$,
measure the goodness of fit of the model $y = \eta(x, \beta)$ at the
parameter value $\beta$. The deviations $d_i(\hat{\beta})$;
$i = 1, \ldots, n$ are called the residuals.


If $\eta(x, \beta) = x^T \beta$ then we have the special case of
linear least squares, and (2.1) becomes

(2.2)  $\phi(\beta) = \sum_{i=1}^{n} [y_i - x_i^T \beta]^2$.

Let X be the matrix whose ith row is $x_i^T$,
$Y = (y_1, \ldots, y_n)^T$ and V be the column space of X; then by
minimizing (2.2) we are finding a $\hat{\beta}$ such that
$X\hat{\beta}$ is the vector in V that is the closest to Y. Thus,
$\hat{\beta}$ solves the linear least squares problem if and only if
$X\hat{\beta}$ is the projection of Y onto V. From the projection
theorem, $X\hat{\beta}$ is that vector such that we may write
$Y = X\hat{\beta} + (Y - X\hat{\beta})$ with $X\hat{\beta} \in V$ and
$Y - X\hat{\beta}$ orthogonal to V; hence,
$X^T (Y - X\hat{\beta}) = 0$ or, in the form known as the normal
equations,

(2.3)  $X^T X \hat{\beta} = X^T Y$.


We have seen that if $\hat{\beta}$ solves the linear least squares
problem then it solves the normal equations. Suppose that $\beta$
satisfies the normal equations; then $X^T X \beta = X^T Y$; thus
$X^T (Y - X\beta) = 0$. Therefore $X\beta$ is the projection of Y on
V; hence, $\beta$ solves the linear least squares problem. Thus
$\beta$ solves the linear least squares problem iff $\beta$ solves
the normal equations.

If X is of full rank then $X^T X$ is nonsingular; hence, the normal
equations have a unique solution

$\hat{\beta} = (X^T X)^{-1} X^T Y$.

However, if X is not of full rank then there will exist infinitely
many solutions; a general solution to (2.3) will be given by

$\beta = (X^T X)^{-} X^T Y + [(X^T X)^{-} (X^T X) - I] z$,

where z is arbitrary and $(X^T X)^{-}$ denotes the generalized inverse
of $X^T X$, see Rao (1965, p. 24).
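For the full rank case, the normal equations $X^T X \beta = X^T Y$ can be formed and solved directly. The sketch below assumes a straight-line model with an intercept column and a slope column, and four made-up data points; the function name `normal_equations` is illustrative, and the 2 x 2 system is solved in closed form.

```python
def normal_equations(X, Y):
    """Solve X^T X beta = X^T Y for a full-rank X with exactly two columns."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    g0 = sum(r[0] * y for r, y in zip(X, Y))
    g1 = sum(r[1] * y for r, y in zip(X, Y))
    det = a * c - b * b          # nonzero exactly when X has full rank
    return [(c * g0 - b * g1) / det, (a * g1 - b * g0) / det]


# Assumed data: rows of X are (1, x_i); Y is roughly 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.1, 2.9, 5.2, 6.8]
beta = normal_equations(X, Y)

# The residual vector Y - X beta should be orthogonal to each column of X.
resid = [y - (r[0] * beta[0] + r[1] * beta[1]) for r, y in zip(X, Y)]
```

The orthogonality of the residuals to the columns of X is exactly the projection property used to derive (2.3).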

We now briefly consider linear least squares with constraints. To
this end we first reformulate the unconstrained problem:

Minimize $(Y - \eta)^T (Y - \eta)$ subject to $\eta = X\beta$ or
$\eta \in V$.

Alternatively we may take the feasible region to be the space
orthogonal to the space orthogonal to V. Let
$p_{r+1}, \ldots, p_n$ be a basis for the space orthogonal to V and
let $G = (p_{r+1}, \ldots, p_n, -p_{r+1}, \ldots, -p_n)$. The
constraints can be written $G^T \eta \ge 0$.

If in addition we have the constraints

$W^T \eta \ge b$

then our problem becomes

and multiplying by X we obtain

These are the generalized normal equations to be solved for $\beta$
and Q.

The following results are of interest for nonlinear estimation, see
Marquardt (1963). However, they are actually theorems in constrained
linear least squares, so we present them here and return to them
later. Consider the programming problem:

minimize $\|Y - X\beta\|$ with respect to $\beta$

The Kuhn-Tucker conditions are necessary and sufficient for

There exists an orthogonal matrix S such that
$S^T X^T X S = D = \mathrm{diag}(d_1, \ldots, d_p)$. Since $u > 0$,
$\beta$
Writing $S^T X^T Y = V = (v_1, \ldots, v_p)^T$

from which the truth of the theorem is evident.

Let $\alpha$ be the angle between $\beta_u$ and $X^T Y$; then
$\alpha$ is strictly decreasing as a function of u and $\alpha$ tends
to 0 as u tends to $\infty$.

... strictly increasing function of u and that $\cos \alpha$ tends to
1 as u ...

using Schwarz's inequality, $d \cos \alpha / du > 0$.


Related to the above two results is the expansion, for
$\beta_u = (X^T X + uI)^{-1} X^T Y$,

(2.5)  $\beta_u = u^{-1} X^T Y - u^{-2} (X^T X) X^T Y + u^{-3} (X^T X)^2 X^T Y - \cdots$

valid for u greater than the maximum characteristic root of $X^T X$.
This is obtained from the geometric expansion for matrices

$(M + I)^{-1} = I - M + M^2 - \cdots$,

see for example Friedman (1956).
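A quick numerical check of a truncated version of this expansion is sketched below, taking $\beta_u$ to be the solution of $(X^T X + uI) \beta_u = X^T Y$. The 2 x 2 matrix standing in for $X^T X$, the vector standing in for $X^T Y$, and the helper names `solve2` and `matvec` are assumptions for illustration. The largest characteristic root of the assumed matrix is about 16.8, so u = 100 lies in the range where the series converges.

```python
def solve2(A, g):
    """Solve the 2 x 2 system A z = g by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def matvec(A, v):
    """Product of a 2 x 2 matrix with a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


XtX = [[4.0, 6.0], [6.0, 14.0]]   # assumed stand-in for X^T X
XtY = [16.0, 33.7]                # assumed stand-in for X^T Y
u = 100.0

# Direct solution of (X^T X + u I) beta_u = X^T Y.
exact = solve2([[XtX[0][0] + u, XtX[0][1]], [XtX[1][0], XtX[1][1] + u]], XtY)

# First three terms of the series in powers of 1/u.
Ag = matvec(XtX, XtY)
AAg = matvec(XtX, Ag)
series = [XtY[i] / u - Ag[i] / u ** 2 + AAg[i] / u ** 3 for i in range(2)]
error = max(abs(exact[i] - series[i]) for i in range(2))
```

The truncation error is of the order of the first omitted term, which shrinks as u grows relative to the largest root.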

2.2  Nonlinear  Least  Squares 

Nonlinear least squares problems arise when one attempts to fit a
model $y = \eta(x, \beta)$ with $\eta$ nonlinear in $\beta$.

We first make a general observation about residuals. If
$\eta(x, \beta)$ is of the form
$\eta(x, \beta) = \beta_1 + \psi(x; \beta_2, \ldots, \beta_p)$ then

$\frac{\partial \phi}{\partial \beta_1} = -2 \sum_{i=1}^{n} [y_i - \beta_1 - \psi(x_i; \beta_2, \ldots, \beta_p)]$.

Equating this to 0 we get $\sum_{i=1}^{n} d_i(\hat{\beta}) = 0$; that
is, the residuals sum to 0.

Explicit solutions will usually not be available in the nonlinear
case and one must resort to iterative minimization techniques. We now
present four iteration methods specifically adapted to the nonlinear
least squares problem.

Steepest  descent 

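The gradient of (2.1) is

$\nabla \phi = -2 X(\beta)^T [Y - \eta(\beta)]$,

where $X(\beta)$ is the $n \times p$ matrix whose (i, k) element is
$\partial \eta(x_i, \beta) / \partial \beta_k$.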


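Gauss-Newton method

The Gauss-Newton method is an iteration procedure which assumes local
linearity of $\eta(x, \cdot)$ about $\beta$ to obtain the new iterate
$G\beta$. The equation of the tangent plane to the surface determined
by $\eta(x, \cdot)$, at the point $\beta^*$, is

$z = \eta(x, \beta^*) + \sum_{k=1}^{p} \frac{\partial \eta(x, \beta^*)}{\partial \beta_k} (\beta_k - \beta_k^*)$.

Replacing $\eta$ by its linear approximation reduces the problem to
the previously considered case of linear least squares. That is, we
wish to minimize

$\| Y - \eta(\beta^*) - X(\beta^*)(\beta - \beta^*) \|^2$

with respect to $\beta$, or

$\| d(\beta^*) - X(\beta^*) \delta \|^2$

with respect to $\delta = \beta - \beta^*$, where $X(\beta^*)$ is the
matrix of partial derivatives of $\eta$ evaluated at $\beta^*$. This
is equivalent to projecting $d(\beta^*)$ onto the column space of
$X(\beta^*)$. The solution is given by solving the normal equations

$X(\beta^*)^T X(\beta^*) \delta = X(\beta^*)^T d(\beta^*)$.

Replacing $\beta^*$ by $\beta$ and $\delta$ by $G\beta - \beta$ we get

$G\beta = \beta + [X(\beta)^T X(\beta)]^{-} X^T(\beta) d(\beta)$.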
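One Gauss-Newton step amounts to a linear least squares solve with the matrix of partial derivatives. The sketch below assumes the two-parameter model $\eta(x, \beta) = \beta_1 e^{\beta_2 x}$, noise-free synthetic data generated from $\beta = (2, 0.5)$, and a starting value close to that point; the names `eta`, `jac_row`, and `gauss_newton_step` are illustrative. The full rank case is assumed, so the ordinary inverse is used in place of a generalized inverse.

```python
import math


def eta(x, beta):
    """Assumed model: eta(x, beta) = beta1 * exp(beta2 * x)."""
    return beta[0] * math.exp(beta[1] * x)


def jac_row(x, beta):
    """Partial derivatives of eta with respect to beta1 and beta2 at x."""
    e = math.exp(beta[1] * x)
    return [e, beta[0] * x * e]


def gauss_newton_step(xs, ys, beta):
    """Return G(beta) = beta + (X^T X)^{-1} X^T d for the two-parameter model."""
    d = [y - eta(x, beta) for x, y in zip(xs, ys)]
    J = [jac_row(x, beta) for x in xs]
    a = sum(r[0] * r[0] for r in J)
    b = sum(r[0] * r[1] for r in J)
    c = sum(r[1] * r[1] for r in J)
    g0 = sum(r[0] * di for r, di in zip(J, d))
    g1 = sum(r[1] * di for r, di in zip(J, d))
    det = a * c - b * b
    delta = [(c * g0 - b * g1) / det, (a * g1 - b * g0) / det]
    return [beta[0] + delta[0], beta[1] + delta[1]]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic data, no noise
est = [1.8, 0.55]                            # assumed starting value
for _ in range(10):
    est = gauss_newton_step(xs, ys, est)
```

With a starting value this close to the generating parameters, a handful of steps is enough for the iterates to settle.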


There exist examples with well behaved functions $\eta(x_i, \beta)$
for which the Gauss-Newton iteration will not converge no matter how
good the starting value. However, Jennrich (1969) gives four
conditions which are collectively sufficient so that such
difficulties are not likely to arise when n, the sample size, is
large and the starting value is close to the true parameter value,
$\beta$.

One of the sufficient conditions just mentioned is that the parameter
space is a compact subset of Euclidean space. Hence, in using the
results obtained one would want to restrict the investigation to some
closed and bounded subset of the parameter space.



Hartley's  Modified  Gauss-Newton  Method 


In considering the Gauss-Newton method we notice that, given $\beta$,
there is input of information from the objective function, $\phi$,
concerning the choice of the next iterate only through a quadratic
approximation, and it is possible for the value of the objective
function to actually increase from one iteration to the next. This
increase would not, in itself, invalidate the procedure but it could
cause slow convergence. Hartley (1961) modifies the Gauss-Newton
method to allow greater input from the objective function in the
iteration procedure. This input is obtained by making the following
assumptions:


(a) the parameter space, $\Omega$, is a convex subset of $E^p$;

(b) $\frac{\partial \eta(x_i, \beta)}{\partial \beta_k}$ and
    $\frac{\partial^2 \eta(x_i, \beta)}{\partial \beta_\ell \, \partial \beta_k}$
    exist and are continuous for all $i = 1, \ldots, n$ and
    $\ell, k = 1, \ldots, p$;

(c) there exists a bounded convex subset, S, of the parameter space
    such that for every $\beta \in S$ and every
    $\mu = (\mu_1, \ldots, \mu_p)^T \ne 0$,
    $\sum_{i=1}^{n} \left( \sum_{k=1}^{p} \mu_k \frac{\partial \eta(x_i, \beta)}{\partial \beta_k} \right)^2 > 0$
    (this is equivalent to requiring that $X(\beta)$ be of full rank
    in S);

(d) there exists $\beta^0$ in the interior of S such that
    $\phi(\beta^0) < \inf_{\beta \in S^c} \phi(\beta)$.

Hartley's algorithm is as follows:

(i) choose $\beta = \beta^0$ as starting vector;

(ii) obtain another estimate $G\beta$ by the usual Gauss-Newton
     method (the existence of $G\beta$ is guaranteed by assumption
     (c));

(iii) consider $\phi(\lambda \beta + (1 - \lambda) G\beta)$,
      $\lambda \in [0, 1]$, and let $\lambda^* \in [0, 1]$ be such
      that
      $\min_{\lambda \in [0, 1]} \phi(\lambda \beta + (1 - \lambda) G\beta) = \phi(\lambda^* \beta + (1 - \lambda^*) G\beta)$
      (from (b) $\phi(\beta)$ is continuous and hence
      $\phi(\lambda \beta + (1 - \lambda) G\beta)$ attains a minimum
      on [0, 1]);

(iv) replace $\beta$ by
     $H\beta = \lambda^* \beta + (1 - \lambda^*) G\beta$ and return
     to (ii).

Hartley argues that given a sequence, $\{H^i \beta^0\}$, constructed
by this algorithm, there exists a subsequence, $\{H^{i_k} \beta^0\}$,
converging to a point $\hat{\beta}$ which is a critical point of
$\phi(\beta)$. If, in addition to the above assumptions, the Hessian
of $\phi(\beta)$ is positive definite on S, then $\hat{\beta}$ is the
unique minimum of $\phi(\beta)$.
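The effect of step (iii) can be seen on a small example. The sketch assumes a one-parameter model $\eta(x, \beta) = e^{\beta x}$ with noise-free synthetic data generated from $\beta = 0.5$, and it replaces the exact minimization over $\lambda$ by a search over an evenly spaced grid on [0, 1], which is a simplification and not Hartley's own rule. All names (`phi`, `gauss_newton`, `hartley`) are illustrative. From the starting value $\beta = -1$ the plain Gauss-Newton step overshoots badly and increases $\phi$, while the modified step cannot increase it because the grid includes $\lambda = 1$.

```python
import math

XS = [0.0, 1.0, 2.0, 3.0]
YS = [math.exp(0.5 * x) for x in XS]   # synthetic, noise-free data (assumed)


def phi(beta):
    """Sum of squared deviations for the assumed model eta = exp(beta * x)."""
    return sum((y - math.exp(beta * x)) ** 2 for x, y in zip(XS, YS))


def gauss_newton(beta):
    """One ordinary Gauss-Newton step G(beta) for the one-parameter model."""
    J = [x * math.exp(beta * x) for x in XS]
    d = [y - math.exp(beta * x) for x, y in zip(XS, YS)]
    return beta + sum(j * di for j, di in zip(J, d)) / sum(j * j for j in J)


def hartley(beta, iters=30, grid=20):
    """Modified step: search lam*beta + (1 - lam)*G(beta) over a grid of lam."""
    history = [phi(beta)]
    for _ in range(iters):
        g_beta = gauss_newton(beta)
        candidates = [
            (k / grid) * beta + (1 - k / grid) * g_beta for k in range(grid + 1)
        ]
        beta = min(candidates, key=phi)
        history.append(phi(beta))
    return beta, history


est, history = hartley(-1.0)
```

The recorded values in `history` never increase, and the iterates end up at the generating value of the parameter.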

The  Marquardt  method 

Marquardt's  algorithm  for  the  solution  of  nonlinear  least 
squares  problems  is  a compromise  between  the  Gauss-Newton  and 
steepest  descent  methods,  the  objective  of  this  compromise  being 
the  avoidance  of  problems  associated  with  the  two  methods. 

Let us first review the major difficulties attributed to the use of
the Gauss-Newton and steepest descent methods. First, steepest
descent does not specify the step length. Second, if the level sets
of $\phi$ tend to be elongated then the method of steepest descent
will converge rapidly for the first few iterates and then oscillate
about the axis of elongation, taking smaller and smaller steps as the
iterates approach the minimum. This is because the correction for
$\beta$ obtained in steepest descent is perpendicular to the level
set at $\beta$; hence, for points close to the minimum, the
correction vector may be almost perpendicular to the direction of the
minimum if the level sets are greatly elongated. The main problem
encountered with the Gauss-Newton method is lack of convergence of
the iteration if the starting points are a long way from the minimum.


One possible solution to these problems is to use the steepest
descent method for the first few iterations and then switch to the
Gauss-Newton method when the progress becomes slow. Marquardt's
method is one way in which this can be done.

As in the Gauss-Newton method, assume local linearity of
$\eta(x; \cdot)$ at the point $\beta$. Theorem 2.1 then states that
the issue concerning the choice of step length can be restated in
terms of a Lagrange multiplier u. More specifically, least squares
subject to a constraint on the maximum step length leads to the
equation

$[X^T(\beta) X(\beta) + uI] \delta_u = X^T(\beta) d(\beta)$

where $\delta_u$ is the correction to $\beta$ and
$d(\beta) = Y - \eta(\beta)$. From (2.5) we get, for large u,

$\delta_u \approx u^{-1} X^T(\beta) [Y - \eta(\beta)]$.

But $X^T(\beta) [Y - \eta(\beta)]$ is the direction of steepest
descent and $\phi$ is continuous so that for u sufficiently large

(2.6)  $\phi(\delta_u + \beta) < \phi(\beta)$.

Marquardt (1963) recommends that the next iterate, say $M\beta$, be
given by $M\beta = \beta + \delta_u$ where u is chosen just large
enough to satisfy (2.6). Thus, in outline, $\delta_0$ is the
correction given by the Gauss-Newton method while for large u,
$\delta_u$ is in the direction of steepest descent. Thus
$\beta + \delta_u$ determines a continuous curve on which Marquardt's
method interpolates between the Gauss-Newton and steepest descent
methods.
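A minimal sketch of this scheme follows. It assumes the model $\eta(x, \beta) = \beta_1 e^{\beta_2 x}$ with synthetic noise-free data, a starting value near the generating parameters, and a simple rule for u: start at 0.01 and multiply by 10 until (2.6) holds. That rule is a stand-in for "just large enough" and is an assumption of this example, as are the names `phi`, `correction`, and `marquardt`.

```python
import math

XS = [0.0, 1.0, 2.0, 3.0]
YS = [2.0 * math.exp(0.5 * x) for x in XS]   # synthetic data from beta = (2, 0.5)


def phi(beta):
    """Sum of squared deviations for the assumed model beta1 * exp(beta2 * x)."""
    return sum((y - beta[0] * math.exp(beta[1] * x)) ** 2 for x, y in zip(XS, YS))


def correction(beta, u):
    """Solve [X^T X + u I] delta = X^T d for the two-parameter model."""
    J = [[math.exp(beta[1] * x), beta[0] * x * math.exp(beta[1] * x)] for x in XS]
    d = [y - beta[0] * math.exp(beta[1] * x) for x, y in zip(XS, YS)]
    a = sum(r[0] * r[0] for r in J) + u
    b = sum(r[0] * r[1] for r in J)
    c = sum(r[1] * r[1] for r in J) + u
    g0 = sum(r[0] * di for r, di in zip(J, d))
    g1 = sum(r[1] * di for r, di in zip(J, d))
    det = a * c - b * b
    return [(c * g0 - b * g1) / det, (a * g1 - b * g0) / det]


def marquardt(beta, iters=30):
    """Accept beta + delta_u for the first u in 0.01, 0.1, 1, ... that lowers phi."""
    for _ in range(iters):
        u = 0.01
        for _ in range(40):
            delta = correction(beta, u)
            trial = [beta[0] + delta[0], beta[1] + delta[1]]
            if phi(trial) < phi(beta):
                beta = trial
                break
            u *= 10.0
        else:
            break   # no u in the tried range reduced phi, so stop iterating
    return beta


est = marquardt([1.8, 0.55])
```

Small u gives a step close to the Gauss-Newton correction, while each tenfold increase shortens the step and turns it toward the steepest descent direction.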


This  manuscript  is  the  joint  work  of  the  author  and 